A binary renewal process is a stochastic process $\{X_n\}$ taking values
in $\{0,1\}$ where the lengths of the runs of 1's between successive
zeros are independent. After observing $X_0, X_1, \dots, X_n$ one would like
to predict the future behavior, and the problem of universal estimators
is to do so without any prior knowledge of the distribution.
We prove a variety of results of this type, including universal estimates
for the expected time to renewal as well as estimates for the
conditional distribution of the time to renewal. Some of our results
require a moment condition on the time to renewal and we show by
an explicit construction how some moment condition is necessary.

1. Introduction. The classical binary renewal process is a stochastic
process $\{X_n\}$ taking values in $\{0,1\}$ where the lengths of the runs of 1's
between successive zeros are independent. These arise, for example, in the
study of Markov chains since the return times to a fixed state form such a
renewal process; cf. [7]. (More details on this will be given in the next
section.) In many applications, the occurrences of a zero, which represent the
failure times of some system which is renewed after each failure, are of
importance and so the problem arises of estimating when the next failure will
occur; cf. Example 12.13 in [8].

Our purpose in this paper is to investigate the possibility of giving a
universal estimator at time $n$ for the residual waiting time to the next zero
in the binary renewal process $\{X_n\}$. Let $\{p_k\}_{k=0}^\infty$ be the
conditional probability that a run of $k$ 1's follows a given zero. This
distribution describes completely the renewal process as a two-sided stationary
process. In order that the probability of $X_0 = 0$ be nonzero it is necessary
that $\mu = \sum_{k=0}^\infty k p_k < \infty$, and then $P(X_0 = 0) = 1/(1+\mu)$
is positive. (This relation between the mean of the conditional renewal
distribution and the stationary probability of the renewal event is well known
in ergodic theory as Kac's formula for the expected return time to a set, and
in probability theory; cf. [7], Chapter XIII and [28], Section I.2.c.) If the
process distribution is known, then after observing $X_0, X_1, \dots, X_n$ one
may give a consistent estimator for the expected value of the residual waiting
time to the occurrence of the next zero as

$$\mu_L = \frac{\sum_{k=L}^\infty (k-L)\,p_k}{\sum_{k=L}^\infty p_k}$$

if there is at least one zero among the values of $X_0, X_1, \dots, X_n$ and the
last zero occurs at moment $X_{n-L} = 0$. (Indeed, if $X_{n-L} = 0$ and
$X_i = 1$ for all $n-L < i \le n$, then for $k \ge L$ the probability that
$X_i = 1$ for all $n+1 \le i < n-L+k+1$ and $X_{n+k-L+1} = 0$ is
$p_k / \sum_{i=L}^\infty p_i$.) We denote this $L$ by $\tau(X_0, X_1, \dots, X_n)$.
Similarly we define $\tau = \tau(X_{-\infty}^0)$ as that $t \ge 0$ such that
$X_{-t} = 0$ and $X_i = 1$ for all $-t < i \le 0$. It is clear from the
stationarity that $P(\tau = L)$ is proportional to $\sum_{k=L}^\infty p_k$ and
thus for the finiteness of the unconditional expectation of the residual
waiting time we would have to demand that $\sum_{k=0}^\infty k^2 p_k < \infty$.
We shall not assume this since we are interested primarily in the conditional
expectations, and with probability 1 for $n$ sufficiently large at least one of
the $X_i = 0$ for $0 \le i \le n$, and for any fixed value of
$\tau(X_0, \dots, X_n) = L \le n$ the expected residual waiting time is
$\mu_L < \infty$. This of course is well known in the classical analysis of
renewal processes. In the spirit of recent investigations into universal
estimators for various features of stationary processes (see [1, 2, 3, 6, 10,
13, 14, 17, 23, 24, 25, 29, 30]) we take up here the problem of how well we can
do when all that we know is that the binary process is in fact a renewal
process.
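As a small numerical illustration of the known-distribution quantities above, the following Python sketch evaluates $\mu_L$ and the stationary law of $\tau$ for a finitely supported run-length distribution. The function names and the example vector `p` are assumptions made here for illustration; they do not come from the paper.

```python
def residual_mean(p, L):
    """mu_L: expected residual run length when the last zero was L steps ago."""
    tail = sum(p[L:])
    if tail == 0:
        raise ValueError("tau = L has probability zero under p")
    return sum((k - L) * p[k] for k in range(L, len(p))) / tail


def stationary_tau(p):
    """P(tau = L) = sum_{k >= L} p_k / (1 + mu) for L = 0, ..., len(p) - 1."""
    mu = sum(k * pk for k, pk in enumerate(p))
    return [sum(p[L:]) / (1 + mu) for L in range(len(p))]


# Hypothetical run-length law: runs of 0, 1 or 2 ones after each zero.
p = [0.5, 0.25, 0.25]
oracle = [residual_mean(p, L) for L in range(len(p))]
tau_law = stationary_tau(p)
```

The normalising constant $1+\mu$ is exactly the one given by Kac's formula, since $\sum_{L \ge 0}\sum_{k \ge L} p_k = \sum_k (k+1)p_k = 1+\mu$.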

The fact that we are trying to estimate the time to next occurrence of zero
rather than $X_{n+1}$ takes us out of the framework of previous investigations.
In earlier works such as [11] attention is restricted to those renewal processes 
which arise from Markov chains with a finite number of states. In that case 
the probabilities pk decay exponentially and one can use this information in 
trying to find not only the distribution but even the hidden Markov chain 
itself. Since we are considering the general case where the number of hidden 
states might be infinite, this exponential decay no longer holds in general 
and the problem becomes much more difficult. 

For the estimator itself it is most natural to use the empirical distribution
observed in the data segment $X_0, X_1, \dots, X_n$. However, if there were an
insufficient number of occurrences of 1-blocks of length at least
$\tau(X_0, X_1, \dots, X_n)$, then we do not expect the empirical distribution
to be close to the true distribution. In particular, if no block of that length
has occurred yet, clearly no intelligent estimate can be given. For this reason
we will estimate only along stopping times $\lambda_1, \lambda_2, \dots$ and our
main positive result is that there is a sequence of universally defined
stopping times $\lambda_n$ with density 1 and estimators
$h_n(X_0, X_1, \dots, X_{\lambda_n})$ which are almost surely converging to
$\mu_{\tau(X_0, X_1, \dots, X_{\lambda_n})}$. (For further reading on estimation
along stopping times see [12, 15, 16, 18, 21, 22].) We also will define
estimators $\hat p_l(X_0, X_1, \dots, X_{\lambda_n})$ which are almost surely
converging in the variation metric to the conditional distribution of the
residual waiting time. These results will require a suitable higher moment
condition on the $\{p_k\}$ distribution. These estimators are simply the
averages of what we observe in a piece of the data segment
$X_{\kappa_n}, \dots, X_{\lambda_n}$ where $\kappa_n$ is chosen so that there
is a large fixed number of occurrences of the relevant pattern. The reason for
these stopping times $\lambda_n$ is that we want to estimate only at those
times when we feel that we have enough data.

Another kind of result may be obtained without a higher moment condition.
Namely, there is a sequence of estimators $\hat h_n$ and $\hat p_l$ such that
for any renewal process and almost every sequence of observations
$X_0, X_1, \dots$ there is a sequence $D$ of $n$'s of density 1, which depends
on the observed sequence of $X_i$, along which these estimators converge to the
$\mu_\tau$ and conditional distributions of residual waiting times. The
difference is that now we are unable to determine what these sequences are by
finite observations.

On the other hand, for stopping times of density 1 we will show that no
such result is possible in general, that is, without higher moment
assumptions. More precisely, there is no strictly increasing sequence of
stopping times $\{\lambda_n\}$ with density 1, and sequence of estimators
$\{h_n(X_0, \dots, X_{\lambda_n})\}$, such that for all binary classical renewal
processes

$$\limsup_{n\to\infty} |h_n(X_0, \dots, X_{\lambda_n}) - \mu_{\tau(X_0, \dots, X_{\lambda_n})}| = 0 \quad\text{almost surely.}$$

(For results in a similar vein see [4, 9, 19, 20, 27].)

In spite of this negative result, without any condition on higher moments,
we can find stopping times with density close to 1 along which we converge to
the estimates that are possible with full knowledge of the system. That is to
say, for any $\varepsilon > 0$ there exists a sequence of stopping times
$\{\lambda_n^{(\varepsilon)}\}$ and estimators
$\{h_n^{(\varepsilon)}(X_0, \dots, X_{\lambda_n^{(\varepsilon)}})\}$ and
$\{p_l^{(\varepsilon)}(X_0, \dots, X_{\lambda_n^{(\varepsilon)}})\}$ such that the
density of the stopping times is greater than $1 - \varepsilon$ and almost
surely these estimators converge to $\mu_\tau$ and the conditional
distributions of residual waiting times, respectively.

2. Results. It is easiest to formally define a renewal process in terms of
an underlying Markov chain. Consider a Markov chain on the state space
$\{0, 1, 2, \dots\}$ with transition probabilities $p_{i,i-1} = 1$ for all
$i \ge 1$ and $p_{0,i} = p_i$, a probability distribution $\pi$ on
$\{0, 1, 2, \dots\}$; cf. [8], Example 12.13. This chain is positive recurrent
exactly when $\sum_{i=0}^\infty i\,p_{0,i} = \mu < \infty$ and the unique
stationary probability assigns mass $\frac{1}{1+\mu}$ to the state 0; cf. [7],
Chapter XIII and [28], Section I.2.c. Collapsing all states $i \ge 1$ to 1
gives rise to the classical binary renewal process. Even though our primary
interest is in one-sided processes, stationarity implies that there exists a
two-sided process with the same statistics and we will use the two-sided
version whenever it is convenient to do so.
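The Markov-chain description can be turned into a short simulation. The sketch below is only illustrative: it assumes a finitely supported $\{p_k\}$, starts the chain at state 0 (a renewal) rather than from the stationary distribution, and the name `simulate_renewal` is hypothetical.

```python
import random


def simulate_renewal(p, n, seed=0):
    """Return X_0, ..., X_{n-1} for the chain with p_{0,i} = p[i], p_{i,i-1} = 1."""
    rng = random.Random(seed)
    states = list(range(len(p)))
    state = 0                      # assumption: start at a renewal, not in stationarity
    xs = []
    for _ in range(n):
        xs.append(0 if state == 0 else 1)   # collapse all states i >= 1 to 1
        if state == 0:
            state = rng.choices(states, weights=p)[0]   # jump to i with prob. p_i
        else:
            state -= 1                                   # deterministic countdown
    return xs


# Deterministic check: every zero is followed by exactly two ones.
periodic = simulate_renewal([0.0, 0.0, 1.0], 7)

# Empirical frequency of zeros versus Kac's value 1 / (1 + mu).
p = [0.5, 0.25, 0.25]
mu = sum(k * pk for k, pk in enumerate(p))
sample = simulate_renewal(p, 200_000, seed=1)
zero_freq = sample.count(0) / len(sample)
```

For a long sample, `zero_freq` should be close to $1/(1+\mu)$, in line with the stationary mass of state 0 quoted above.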

For conciseness, we will denote $X_i^j = (X_i, \dots, X_j)$ and also use this
notation for $i = -\infty$ and $j = \infty$. Our interest is in the waiting
time to renewal (the state 0) given some previous observations, in particular
given $X_0^n$. Recall that if the data segment $X_0^n$ does not contain a zero,
the expected time to the first occurrence of a zero may be infinite; this
depends on the finiteness of the second moment of $\pi$. If a zero occurs, then
the expected time depends on the location of the zero and so we introduce the
notation:

$$\tau(X_{-\infty}^n) = \text{the } t \ge 0 \text{ such that } X_{n-t} = 0 \text{ and } X_i = 1 \text{ for } n-t < i \le n.$$

Note that this is well defined with probability 1. If a zero occurs in
$X_0^n$, then $\tau(X_{-\infty}^n)$ depends only on $X_0^n$ and so we will also
write $\tau(X_0^n)$ for $\tau(X_{-\infty}^n)$, with the understanding that this
is defined only if a zero occurs in $X_0^n$.
Now for the classical binary renewal process $\{X_n\}$ define $\theta_n$ as

$$\theta_n = E(\max\{0 \le k : X_i = 1 \text{ for all } n < i \le n+k\} \mid X_0^n).$$

(Note that
$\theta_n = \frac{\sum_{k=0}^\infty k\,p_{k+\tau(X_0,\dots,X_n)}}{\sum_{k=\tau(X_0,\dots,X_n)}^\infty p_k}$
as soon as there is at least one zero in $X_0^n$. As we have already mentioned,
if no zero occurs, then it might happen that $\theta_n = \infty$.) For a family
of processes $\{X_n^{(j)}\}$ we use the notation $\theta_n^{(j)}$. Our goal is
to estimate both $\theta_n$ and the distribution of the time to renewal given
$X_0^n$ but without prior knowledge of the distribution function of the
process. Define $\psi$ as the position of the first zero, that is,

$$\psi = \min\{t \ge 0 : X_t = 0\}.$$

Let $0 < \gamma < 1$ be arbitrary. First define the stopping times $\lambda_n$
as $\lambda_0 = \psi$ and for $n \ge 1$,

$$\lambda_n = \min\{t > \lambda_{n-1} : |\{\psi \le i < t : \tau(X_0^i) = \tau(X_0^t)\}| > t^{1-\gamma}\}.$$

These are the successive times $t$ when the value $\tau = \tau(X_0^t)$ has
occurred previously enough times so that we can safely estimate the residual
renewal time by empirical distributions derived from observations already made.
We also need to fix $\kappa_n$ as the index where reading backward from
$X_{\lambda_n}$ we will have seen for the first time $\ge \lambda_n^{1-\gamma}$
occurrences of an $i$ with $\tau(X_0^i) = \tau(X_0^{\lambda_n})$. Formally put

$$\kappa_n = \max\{K : |\{K \le i < \lambda_n : \tau(X_0^i) = \tau(X_0^{\lambda_n})\}| = \lceil \lambda_n^{1-\gamma} \rceil\}.$$

Define $\sigma_i$ as the length of the run of 1's starting at position $i$.
Formally put

$$\sigma_i = \max\{0 \le l : X_j = 1 \text{ for } i < j \le i+l\}.$$

For $n > 0$ define our estimator $h_n(X_0, \dots, X_{\lambda_n})$ at time
$\lambda_n$ as

$$h_n(X_0, \dots, X_{\lambda_n}) = \frac{1}{\lceil (\lambda_n)^{1-\gamma} \rceil} \sum_{i=\kappa_n}^{\lambda_n - 1} I_{\{\tau(X_0^i) = \tau(X_0^{\lambda_n})\}}\,\sigma_i.$$

[Notice that the role of $\kappa_n$ is rather technical. It ensures that we
take into consideration exactly $\lceil (\lambda_n)^{1-\gamma} \rceil$ pieces of
occurrences.] The above formula is simply the average of the residual waiting
times that we have already observed in the data segment $X_0^{\lambda_n}$ when
we were at the same value of $\tau$ as we see at time $\lambda_n$. In a similar
fashion we can define the average of the number of times that the residual
waiting time assumed a fixed value. Namely, define
$\hat p_l(X_0, \dots, X_{\lambda_n})$ for each $l$ as

$$\hat p_l(X_0, \dots, X_{\lambda_n}) = \frac{1}{\lceil (\lambda_n)^{1-\gamma} \rceil} \sum_{i=\kappa_n}^{\lambda_n - 1} I_{\{\tau(X_0^i) = \tau(X_0^{\lambda_n}),\ \sigma_i = l\}}.$$

Note that $\hat p_l(X_0, \dots, X_{\lambda_n})$ is a probability distribution on
the nonnegative integers.
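The stopping rule and the two estimators can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the input is a finite 0/1 list containing at least one zero, the helper names (`taus`, `sigmas`, `stopping_estimates`) are hypothetical, and the window of the most recent $\lceil t^{1-\gamma}\rceil$ matching indices plays the role of the segment from $\kappa_n$ to $\lambda_n - 1$.

```python
import math


def taus(x):
    """tau(X_0^i) for each i (None before the first zero)."""
    out, last = [], None
    for i, v in enumerate(x):
        if v == 0:
            last = i
        out.append(None if last is None else i - last)
    return out


def sigmas(x):
    """sigma_i: number of consecutive ones at positions i+1, i+2, ..."""
    s = [0] * len(x)
    for i in range(len(x) - 2, -1, -1):
        s[i] = s[i + 1] + 1 if x[i + 1] == 1 else 0
    return s


def stopping_estimates(x, gamma):
    """Yield (lambda_n, h_n, p_hat) for n >= 1 along the stopping times."""
    tau, sig = taus(x), sigmas(x)
    psi = x.index(0)
    seen = {}                       # tau value -> indices i in [psi, t) with that value
    for t in range(psi, len(x)):
        idx = seen.get(tau[t], [])
        if t > psi and len(idx) > t ** (1 - gamma):
            m = math.ceil(t ** (1 - gamma))
            window = idx[-m:]       # the last m matches, i.e. from kappa_n onward
            h = sum(sig[i] for i in window) / m
            p_hat = {}
            for i in window:
                p_hat[sig[i]] = p_hat.get(sig[i], 0.0) + 1.0 / m
            yield t, h, p_hat
        seen.setdefault(tau[t], []).append(t)


# Toy input: every zero is followed by exactly two ones, so theta = 2 - tau.
x = [0, 1, 1] * 200
results = list(stopping_estimates(x, gamma=0.25))
```

On this periodic toy input the residual run length is a deterministic function of $\tau$, so each $h_n$ reproduces it exactly; for random input the averages are only approximations. Every matching index $i < t$ lies before the last zero, so $\sigma_i$ is already determined by $X_0^t$.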

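Theorem 1. Assume $\sum_{k=0}^\infty k^{\alpha+1} p_k < \infty$ for some
$\alpha > 2$. Let $0 < \gamma < \min(1 - 2/\alpha, 1/3)$. Then for the stopping
times $\lambda_n$ and the estimators $h_n(X_0, \dots, X_{\lambda_n})$,
$\hat p_l(X_0, \dots, X_{\lambda_n})$ defined above, almost surely,

$$\lim_{n\to\infty} \frac{\lambda_n}{n} = 1, \tag{1}$$

$$\lim_{n\to\infty} |h_n(X_0, \dots, X_{\lambda_n}) - \theta_{\lambda_n}| = 0 \tag{2}$$

and

$$\lim_{n\to\infty} \sum_{l=0}^\infty \left| \hat p_l(X_0, \dots, X_{\lambda_n}) - \frac{p_{l+\tau(X_0^{\lambda_n})}}{\sum_{i=\tau(X_0^{\lambda_n})}^\infty p_i} \right| = 0. \tag{3}$$

Note that $\hat p_l(X_0, \dots, X_{\lambda_n})$, $h_n$ and $\lambda_n$ depend on
$\gamma$ and so on $\alpha$.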

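In order to reduce our assumption from $\alpha > 2$ to $\alpha > 1$ a slightly
more involved scheme of stopping times is needed.

Let $0 < \gamma < 1$ be arbitrary. First define the stopping times
$\lambda_n^*$ as $\lambda_0^* = \psi$ and for $n \ge 1$,

$$\lambda_n^* = \min\Bigl\{t > \lambda_{n-1}^* : \exists\, \psi \le i < \lfloor \log t \rfloor \text{ such that } \tau(X_0^i) = \tau(X_0^t) \text{ and } \bigl|\{\lfloor \log t \rfloor < i \le 2^{\lfloor \log t \rfloor} : \tau(X_0^i) = \tau(X_0^t)\}\bigr| > 2^{\lfloor \log t \rfloor (1-\gamma)}\Bigr\}.$$

(Note that all logarithms are to the base 2.) Put

$$\kappa_n^* = \min\bigl\{K : |\{\lfloor \log \lambda_n^* \rfloor < j \le K : \tau(X_0^j) = \tau(X_0^{\lambda_n^*})\}| = \lceil 2^{\lfloor \log \lambda_n^* \rfloor (1-\gamma)} \rceil\bigr\}.$$

Note that $\kappa_n^* \le 2^{\lfloor \log \lambda_n^* \rfloor}$. For $n > 0$ define
our estimator $h_n^*(X_0, \dots, X_{\lambda_n^*})$ at time $\lambda_n^*$ as

$$h_n^*(X_0, \dots, X_{\lambda_n^*}) = \frac{1}{\lceil 2^{\lfloor \log \lambda_n^* \rfloor (1-\gamma)} \rceil} \sum_{i=\lfloor \log \lambda_n^* \rfloor + 1}^{\kappa_n^*} I_{\{\tau(X_0^i) = \tau(X_0^{\lambda_n^*})\}}\,\sigma_i.$$

(Notice that $\kappa_n^*$ ensures that we take into consideration exactly
$\lceil 2^{\lfloor \log \lambda_n^* \rfloor (1-\gamma)} \rceil$ pieces of
occurrences.) The above formula is simply the average of the residual waiting
times that we have already observed in the data segment
$X_{\lfloor \log \lambda_n^* \rfloor + 1}^{\kappa_n^*}$ when we were at the same
value of $\tau$ as we see at time $\lambda_n^*$.

Note that $h_n^*(X_0, \dots, X_{\lambda_n^*})$ is by far not as efficient as
$h_n(X_0, \dots, X_{\lambda_n})$ since as long as
$2^m \le \lambda_n^* < 2^{m+1}$ the estimator
$h_n^*(X_0, \dots, X_{\lambda_n^*})$ is not refreshed. Keeping the same estimate
for many values of $n$ enables us to use weaker moment assumptions since the
number of unfavorable events that we have to consider is reduced.

In a similar fashion we can define the average of the number of times that
the residual waiting time assumed a fixed value. Namely, define
$p_l^*(X_0, \dots, X_{\lambda_n^*})$ for each $l$ as

$$p_l^*(X_0, \dots, X_{\lambda_n^*}) = \frac{1}{\lceil 2^{\lfloor \log \lambda_n^* \rfloor (1-\gamma)} \rceil} \sum_{i=\lfloor \log \lambda_n^* \rfloor + 1}^{\kappa_n^*} I_{\{\tau(X_0^i) = \tau(X_0^{\lambda_n^*}),\ \sigma_i = l\}}.$$

Note that $p_l^*(X_0, \dots, X_{\lambda_n^*})$ is a probability distribution on
the nonnegative integers.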

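Theorem 2. Assume $\sum_{k=0}^\infty k^{\alpha+1} p_k < \infty$ for some
$\alpha > 1$. Let $0 < \gamma < 1/3$. Then for the stopping times $\lambda_n^*$
and the estimators $h_n^*(X_0, \dots, X_{\lambda_n^*})$,
$p_l^*(X_0, \dots, X_{\lambda_n^*})$ defined above, almost surely,

$$\lim_{n\to\infty} \frac{\lambda_n^*}{n} = 1, \tag{4}$$

$$\lim_{n\to\infty} |h_n^*(X_0, \dots, X_{\lambda_n^*}) - \theta_{\lambda_n^*}| = 0 \tag{5}$$

and

$$\lim_{n\to\infty} \sum_{l=0}^\infty \left| p_l^*(X_0, \dots, X_{\lambda_n^*}) - \frac{p_{l+\tau(X_0^{\lambda_n^*})}}{\sum_{i=\tau(X_0^{\lambda_n^*})}^\infty p_i} \right| = 0. \tag{6}$$

Note that neither $h_n^*$, $p_l^*(X_0, \dots, X_{\lambda_n^*})$ nor $\lambda_n^*$
depend on $\alpha$.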
The main point in the above theorems is that we eventually know when
the error is small. If we do not want to know this, then the moment condition
can be dropped as is exhibited in the next theorem.

Define the estimator $\hat h_n(X_0^n)$ as

$$\hat h_n(X_0^n) = \frac{\sum_{i=\psi}^{n-1} I_{\{\tau(X_0^i) = \tau(X_0^n)\}}\,\sigma_i}{|\{\psi \le i \le n-1 : \tau(X_0^i) = \tau(X_0^n)\}|}.$$

This is just the average of the values of $\sigma_i$ for the data segment
$X_0^n$ for those indices $i$ for which $\tau(X_0^i) = \tau(X_0^n)$.
Define also $\hat p_l(X_0^n)$ as

$$\hat p_l(X_0^n) = \frac{\sum_{i=\psi}^{n-1} I_{\{\tau(X_0^i) = \tau(X_0^n),\ \sigma_i = l\}}}{|\{\psi \le i \le n-1 : \tau(X_0^i) = \tau(X_0^n)\}|}.$$
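A direct, unoptimised Python rendering of these two "average over all earlier matches" estimators may help fix the definitions. It is only a sketch: the helper names are hypothetical, the input is assumed to be a finite 0/1 list containing a zero, and `None` is returned where the ratio is undefined (no zero yet, or no earlier index with the same value of $\tau$).

```python
def tau_at(x, i):
    """tau(X_0^i): distance from i back to the last zero, or None if there is none."""
    t = 0
    while i - t >= 0 and x[i - t] == 1:
        t += 1
    return t if i - t >= 0 else None


def sigma_at(x, i, end):
    """sigma_i computed from x[0..end]: run of ones at positions i+1, i+2, ..."""
    l = 0
    while i + l + 1 <= end and x[i + l + 1] == 1:
        l += 1
    return l


def h_hat_and_p_hat(x, n):
    """Return (h_hat_n, p_hat_n) from X_0^n, or (None, None) if undefined."""
    target = tau_at(x, n)
    if target is None:
        return None, None
    psi = x.index(0)
    matches = [i for i in range(psi, n) if tau_at(x, i) == target]
    if not matches:
        return None, None
    runs = [sigma_at(x, i, n) for i in matches]
    h = sum(runs) / len(runs)
    p = {l: runs.count(l) / len(runs) for l in set(runs)}
    return h, p


x = [0, 1, 1, 0, 1, 1, 0, 1, 1]     # toy data: runs of exactly two ones
h7, p7 = h_hat_and_p_hat(x, 7)      # tau = 1 at time 7
h6, p6 = h_hat_and_p_hat(x, 6)      # tau = 0 at time 6
```

Unlike the stopping-time estimators, nothing here signals whether enough matches have been seen, which is the point of the remark preceding the definitions.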



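Theorem 3. For any binary renewal process $\{X_n\}$, and almost every
sequence of observations $X_0^\infty$, there is a set of indices
$D(X_0^\infty) \subseteq \{0, 1, \dots\}$ such that

$$\lim_{n\to\infty} \frac{|D(X_0^\infty) \cap \{0, 1, \dots, n\}|}{n+1} = 1$$

and

$$\lim_{n \in D(X_0^\infty),\ n\to\infty} |\hat h_n(X_0, \dots, X_n) - \theta_n| = 0 \tag{7}$$

and

$$\lim_{n \in D(X_0^\infty),\ n\to\infty} \sum_{l=0}^\infty \left| \hat p_l(X_0^n) - \frac{p_{l+\tau(X_0^n)}}{\sum_{i=\tau(X_0^n)}^\infty p_i} \right| = 0. \tag{8}$$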

However, for stopping times we need some restrictions to achieve consistency
on density 1, as is shown in the next theorem.



Theorem 4. For any strictly increasing sequence of stopping times
$\{\lambda_n\}$ and sequence of estimators $\{h_n(X_0, \dots, X_{\lambda_n})\}$,
such that for all binary classical renewal processes
$\lim_{n\to\infty} \frac{\lambda_n}{n} = 1$ almost surely, there exists a binary
classical renewal process such that

$$P\Bigl(\limsup_{n\to\infty} |h_n(X_0, \dots, X_{\lambda_n}) - \theta_{\lambda_n}| > 0\Bigr) > 0.$$

We do not know if a similar result can be formulated for the estimation 
of the distribution of the residual waiting times in total variation. 

Finally, if one merely intends to predict along a stopping time with density
greater than $1 - \varepsilon$ for some fixed $\varepsilon > 0$, then no
condition on higher moments at all is required, as stated in the next theorem.
Let $\lambda_0^{(\varepsilon)} = \psi$ and for $n \ge 1$ define

$$\lambda_n^{(\varepsilon)} = \min\{t > \lambda_{n-1}^{(\varepsilon)} : |\{\psi \le i < t : \tau(X_0^i) < \tau(X_0^t)\}| < t(1 - \varepsilon/2)\}.$$

This sequence of stopping times is designed so that eventually we only stop
when $\tau(X_0^t)$ takes values bounded by some finite $L$. The point is that
if $L$ is large enough, then eventually the density of times $i$ when
$\tau(X_0^i) \le L$ will be greater than $(1 - \varepsilon/2)$ so that our
stopping times will choose only moments $t$ at which $\tau(X_0^t)$ is at most
$L$. On the other hand, the fact that eventually the values
$\tau(X_0^{\lambda_n^{(\varepsilon)}})$ are at most $L$ will enable us to prove
the convergence of the empirical estimators by a direct application of the
ergodic theorem.
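The effect of this rule can be seen on a toy sequence. The Python sketch below follows the stopping condition as it reads in the text (stop at $t$ when fewer than $t(1-\varepsilon/2)$ earlier indices have a strictly smaller value of $\tau$); the function name and the test sequence are illustrative assumptions, and the quadratic-time counting is chosen for clarity rather than speed.

```python
def eps_stopping_times(x, eps):
    """Stopping times lambda_0 = psi < lambda_1 < ... for the density-(1 - eps) rule."""
    # tau(X_0^i) for every i, None before the first zero
    tau, last = [], None
    for i, v in enumerate(x):
        if v == 0:
            last = i
        tau.append(None if last is None else i - last)
    psi = x.index(0)
    times = [psi]
    for t in range(psi + 1, len(x)):
        smaller = sum(1 for i in range(psi, t) if tau[i] < tau[t])
        if smaller < t * (1 - eps / 2):
            times.append(t)
    return times


# Toy data: nine zeros, then a single one, repeated. About 90% of the times
# have tau = 0 and 10% have tau = 1.
x = ([0] * 9 + [1]) * 50
times = eps_stopping_times(x, eps=0.3)
```

With $\varepsilon = 0.3$ the threshold is $0.85\,t$, while roughly $0.9\,t$ earlier indices have $\tau = 0$, so the rule never stops at a time with $\tau = 1$ and stops at every time with $\tau = 0$; the stopping times then have density about $0.9 > 1 - \varepsilon$.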
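Define the estimator
$h_n^{(\varepsilon)}(X_0, \dots, X_{\lambda_n^{(\varepsilon)}})$ at time
$\lambda_n^{(\varepsilon)}$ as

$$h_n^{(\varepsilon)}(X_0, \dots, X_{\lambda_n^{(\varepsilon)}}) = \frac{\sum_{i=\psi}^{\lambda_n^{(\varepsilon)}-1} I_{\{\tau(X_0^i) = \tau(X_0^{\lambda_n^{(\varepsilon)}})\}}\,\sigma_i}{|\{\psi \le i < \lambda_n^{(\varepsilon)} : \tau(X_0^i) = \tau(X_0^{\lambda_n^{(\varepsilon)}})\}|}.$$

Also define

$$p_l^{(\varepsilon)}(X_0, \dots, X_{\lambda_n^{(\varepsilon)}}) = \frac{\sum_{i=\psi}^{\lambda_n^{(\varepsilon)}-1} I_{\{\tau(X_0^i) = \tau(X_0^{\lambda_n^{(\varepsilon)}}),\ \sigma_i = l\}}}{|\{\psi \le i < \lambda_n^{(\varepsilon)} : \tau(X_0^i) = \tau(X_0^{\lambda_n^{(\varepsilon)}})\}|}.$$

Theorem 5. For the stopping times $\lambda_n^{(\varepsilon)}$ and estimators
$h_n^{(\varepsilon)}(X_0, \dots, X_{\lambda_n^{(\varepsilon)}})$,
$p_l^{(\varepsilon)}(X_0, \dots, X_{\lambda_n^{(\varepsilon)}})$ defined above,
almost surely,

$$\liminf_{n\to\infty} \frac{n}{\lambda_n^{(\varepsilon)}} > 1 - \varepsilon,$$

$$\limsup_{n\to\infty} |h_n^{(\varepsilon)}(X_0, \dots, X_{\lambda_n^{(\varepsilon)}}) - \theta_{\lambda_n^{(\varepsilon)}}| = 0$$

and

$$\limsup_{n\to\infty} \sum_{l=0}^\infty \left| p_l^{(\varepsilon)}(X_0, \dots, X_{\lambda_n^{(\varepsilon)}}) - \frac{p_{l+\tau(X_0^{\lambda_n^{(\varepsilon)}})}}{\sum_{i=\tau(X_0^{\lambda_n^{(\varepsilon)}})}^\infty p_i} \right| = 0.$$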



3. Proof of Theorem 1. It is easy to see that
$\lim_{n\to\infty} \frac{\lambda_n}{n} = 1$ since if a block of 1's has positive
probability it will appear with that frequency, which is eventually greater
than $\frac{t^{1-\gamma}}{t}$ (which tends to zero). Formally,

$$\liminf_{n\to\infty} \frac{n}{\lambda_n} \ge \liminf_{N\to\infty} \frac{\max\{i \ge 0 : \lambda_i \le N\}}{N}.$$

Thus we have to see why the density of times when we stop and estimate
tends to 1. Since the cutoff $\frac{t^{1-\gamma}}{t}$ tends to zero, any
positive probability event will eventually be greater than it and so for any
bounded $K$ we will have

$$\liminf_{N\to\infty} \frac{\max\{i \ge 0 : \tau(X_0^{\lambda_i}) \le K,\ \lambda_i \le N\}}{N} = P(\tau(X_{-\infty}^0) \le K).$$

As $K$ tends to $\infty$ this last expression tends to 1 and thus

$$1 \ge \limsup_{n\to\infty} \frac{n}{\lambda_n} \ge \liminf_{n\to\infty} \frac{n}{\lambda_n} \ge P(\tau(X_{-\infty}^0) < \infty) = 1.$$

This establishes (1).

The usual proof of the weak law of large numbers for independent and
identically distributed random variables $\{Z_n\}$ with a second moment uses
Chebyshev's inequality
$P(|\sum_{i=1}^n (Z_i - EZ_i)| > n\varepsilon) \le \frac{1}{n\varepsilon^2} E((Z_1 - EZ_1)^2)$.
We will need a sharpening of this for random variables with an $\alpha$th
moment for $\alpha > 2$.

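It will be convenient to extend our process, as we may, to the past, and
establish first an inequality for an estimator based on an unlimited past. For
a given fixed $k$, for $i \ge 1$ define $j_i^{(k)}$ as the $i$th occurrence of
$\tau(X_{-\infty}^k)$ (reading backward) from position $k$, that is,

$$j_i^{(k)} = \max\{j < k : |\{j \le l < k : \tau(X_{-\infty}^l) = \tau(X_{-\infty}^k)\}| = i\}.$$

Now for $i \ge 1$ define

$$Z_i^{(k)} = \sigma_{j_i^{(k)}}.$$

Clearly the $Z_i^{(k)}$ are conditionally independent and identically
distributed given $\tau(X_{-\infty}^k) = L$. Apply the Markov inequality and
Theorem 2.10 of Petrov [26] to get that

$$P\left( \left| \frac{\sum_{i=1}^{\lceil k^{1-\gamma} \rceil} Z_i^{(k)}}{\lceil k^{1-\gamma} \rceil} - \frac{\sum_{h=0}^\infty h\,p_{h+L}}{\sum_{h=L}^\infty p_h} \right| > \varepsilon \,\middle|\, \tau(X_{-\infty}^k) = L \right) \le \frac{2C(\alpha)}{\varepsilon^\alpha k^{(1-\gamma)\alpha/2}} \cdot \frac{\sum_{h=0}^\infty h^\alpha p_{h+L}}{\sum_{h=L}^\infty p_h},$$

where $C(\alpha)$ depends only on $\alpha$. [Notice that
$E(|Z_1^{(k)}|^\alpha \mid \tau(X_{-\infty}^k) = L) = \frac{\sum_{h=0}^\infty h^\alpha p_{h+L}}{\sum_{h=L}^\infty p_h}$.]
Multiply both sides of the last inequality by
$P(\tau(X_{-\infty}^k) = L) = \frac{1}{1 + \sum_{h=0}^\infty h p_h} \sum_{h=L}^\infty p_h$
(note that by Kac's theorem
$P(X_{k-L} = 0) = \frac{1}{1 + \sum_{h=0}^\infty h p_h}$; cf. [7], Chapter XIII
and [28], Section I.2.c) and sum over $L$. It is easy to see that

$$\sum_{L=0}^\infty \frac{\sum_{h=0}^\infty h^\alpha p_{h+L}}{\sum_{h=L}^\infty p_h} \cdot \frac{\sum_{h=L}^\infty p_h}{1 + \sum_{h=0}^\infty h p_h} \le \frac{\sum_{h=0}^\infty h^{\alpha+1} p_h}{1 + \sum_{h=0}^\infty h p_h}$$

and we get the following estimate:

$$P\left( \left| \frac{\sum_{i=1}^{\lceil k^{1-\gamma} \rceil} Z_i^{(k)}}{\lceil k^{1-\gamma} \rceil} - \frac{\sum_{h=0}^\infty h\,p_{h+\tau(X_{-\infty}^k)}}{\sum_{h=\tau(X_{-\infty}^k)}^\infty p_h} \right| > \varepsilon \right) \le \frac{2C(\alpha)}{\varepsilon^\alpha k^{(1-\gamma)\alpha/2}} \cdot \frac{\sum_{h=0}^\infty h^{\alpha+1} p_h}{1 + \sum_{h=0}^\infty h p_h}.$$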
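The summation over $L$ in this step rests on exchanging the order of summation: $\sum_{L}\sum_{h} h^\alpha p_{h+L} = \sum_m p_m \sum_{h=0}^{m} h^\alpha \le \sum_m m^{\alpha+1} p_m$. The following Python sketch checks that inequality numerically for finitely supported distributions; the function names and the test distributions are assumptions made for illustration only.

```python
def double_sum(p, alpha):
    """sum over L >= 0 and h >= 0 of h**alpha * p[h + L] (finite support)."""
    n = len(p)
    return sum(h ** alpha * p[h + L] for L in range(n) for h in range(n - L))


def moment_bound(p, alpha):
    """sum over h of h**(alpha + 1) * p[h]."""
    return sum(h ** (alpha + 1) * ph for h, ph in enumerate(p))


# Two hypothetical run-length laws.
p_small = [0.5, 0.25, 0.25]
weights = [2.0 ** -(k + 1) for k in range(30)]
p_geom = [w / sum(weights) for w in weights]     # truncated geometric law

checks = [(double_sum(p, a), moment_bound(p, a))
          for p in (p_small, p_geom) for a in (1.5, 2.5, 4.0)]
```

Each pair in `checks` should have its first entry no larger than its second, which is what lets a single $(\alpha+1)$th moment control the conditional $\alpha$th moments uniformly after averaging over $L$.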
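Applying the Borel-Cantelli lemma [by assumption
$\frac{(1-\gamma)\alpha}{2} > 1$] one gets that

$$\left| \frac{\sum_{i=1}^{\lceil k^{1-\gamma} \rceil} Z_i^{(k)}}{\lceil k^{1-\gamma} \rceil} - \frac{\sum_{h=0}^\infty h\,p_{h+\tau(X_{-\infty}^k)}}{\sum_{h=\tau(X_{-\infty}^k)}^\infty p_h} \right| < \varepsilon$$

eventually almost surely. Particularly, on the subsequence $\lambda_n$,

$$\left| \frac{\sum_{i=1}^{\lceil (\lambda_n)^{1-\gamma} \rceil} Z_i^{(\lambda_n)}}{\lceil (\lambda_n)^{1-\gamma} \rceil} - \frac{\sum_{h=0}^\infty h\,p_{h+\tau(X_{-\infty}^{\lambda_n})}}{\sum_{h=\tau(X_{-\infty}^{\lambda_n})}^\infty p_h} \right| < \varepsilon$$

eventually almost surely. Since
$\tau(X_{-\infty}^{\lambda_n}) = \tau(X_0^{\lambda_n})$,
$h_n(X_0^{\lambda_n}) = \frac{\sum_{i=1}^{\lceil (\lambda_n)^{1-\gamma} \rceil} Z_i^{(\lambda_n)}}{\lceil (\lambda_n)^{1-\gamma} \rceil}$
and
$\theta_{\lambda_n} = \frac{\sum_{h=0}^\infty h\,p_{h+\tau(X_0^{\lambda_n})}}{\sum_{h=\tau(X_0^{\lambda_n})}^\infty p_h}$,
we get that

$$|h_n(X_0, \dots, X_{\lambda_n}) - \theta_{\lambda_n}| < \varepsilon$$

eventually almost surely, which since $\varepsilon$ was arbitrary gives (2).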
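For (3) observe that

$$\sum_{l=0}^\infty \left| \hat p_l(X_0^{\lambda_n}) - \frac{p_{l+\tau(X_0^{\lambda_n})}}{\sum_{i=\tau(X_0^{\lambda_n})}^\infty p_i} \right| = \sum_{l=0}^{\lceil (\lambda_n)^\gamma \log \lambda_n \rceil - 1} \left| \hat p_l(X_0^{\lambda_n}) - \frac{p_{l+\tau(X_0^{\lambda_n})}}{\sum_{i=\tau(X_0^{\lambda_n})}^\infty p_i} \right| + \sum_{l=\lceil (\lambda_n)^\gamma \log \lambda_n \rceil}^\infty \left| \hat p_l(X_0^{\lambda_n}) - \frac{p_{l+\tau(X_0^{\lambda_n})}}{\sum_{i=\tau(X_0^{\lambda_n})}^\infty p_i} \right| = A_n + B_n.$$

First we deal with the first term. We will use finite sums of exponential
bounds in order to bound it. Now define

$$Z_{i,l}^{(k)} = I_{\{Z_i^{(k)} = l\}}.$$

Clearly the $Z_{i,l}^{(k)}$ are conditionally independent and identically
distributed given $\tau(X_{-\infty}^k) = L$. Apply Hoeffding's inequality to get
that

$$P\left( \left| \frac{\sum_{i=1}^{\lceil k^{1-\gamma} \rceil} Z_{i,l}^{(k)}}{\lceil k^{1-\gamma} \rceil} - \frac{p_{l+L}}{\sum_{h=L}^\infty p_h} \right| > k^{-\gamma} (\log k)^{-2} \,\middle|\, \tau(X_{-\infty}^k) = L \right) \le e^{-k^{1-\gamma} / (2 k^{2\gamma} (\log k)^4)}.$$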
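After integrating both sides with respect to the conditioning, and using the
union bound on the events for $0 \le l \le \lceil k^\gamma \log k \rceil - 1$,
we get

$$P\left( \max_{0 \le l \le \lceil k^\gamma \log k \rceil - 1} \left| \frac{\sum_{i=1}^{\lceil k^{1-\gamma} \rceil} Z_{i,l}^{(k)}}{\lceil k^{1-\gamma} \rceil} - \frac{p_{l+\tau(X_{-\infty}^k)}}{\sum_{h=\tau(X_{-\infty}^k)}^\infty p_h} \right| > k^{-\gamma} (\log k)^{-2} \right) \le \lceil k^\gamma \log k \rceil\, e^{-k^{1-\gamma} / (2 k^{2\gamma} (\log k)^4)},$$

which is summable (by assumption $\gamma < \frac{1}{3}$) and so by the
Borel-Cantelli lemma,

$$\max_{0 \le l \le \lceil k^\gamma \log k \rceil - 1} \left| \frac{\sum_{i=1}^{\lceil k^{1-\gamma} \rceil} Z_{i,l}^{(k)}}{\lceil k^{1-\gamma} \rceil} - \frac{p_{l+\tau(X_{-\infty}^k)}}{\sum_{h=\tau(X_{-\infty}^k)}^\infty p_h} \right| \le k^{-\gamma} (\log k)^{-2}$$

eventually almost surely. Particularly, on the subsequence $\lambda_n$,

$$\sum_{l=0}^{\lceil (\lambda_n)^\gamma \log \lambda_n \rceil - 1} \left| \frac{\sum_{i=1}^{\lceil (\lambda_n)^{1-\gamma} \rceil} Z_{i,l}^{(\lambda_n)}}{\lceil (\lambda_n)^{1-\gamma} \rceil} - \frac{p_{l+\tau(X_{-\infty}^{\lambda_n})}}{\sum_{h=\tau(X_{-\infty}^{\lambda_n})}^\infty p_h} \right| \le \lceil (\lambda_n)^\gamma \log \lambda_n \rceil\, \lambda_n^{-\gamma} (\log \lambda_n)^{-2}$$

eventually almost surely. Observe that
$\tau(X_{-\infty}^{\lambda_n}) = \tau(X_0^{\lambda_n})$ and
$\hat p_l(X_0^{\lambda_n}) = \frac{\sum_{i=1}^{\lceil (\lambda_n)^{1-\gamma} \rceil} Z_{i,l}^{(\lambda_n)}}{\lceil (\lambda_n)^{1-\gamma} \rceil}$
and so we get that

$$A_n = \sum_{l=0}^{\lceil (\lambda_n)^\gamma \log \lambda_n \rceil - 1} \left| \hat p_l(X_0^{\lambda_n}) - \frac{p_{l+\tau(X_0^{\lambda_n})}}{\sum_{i=\tau(X_0^{\lambda_n})}^\infty p_i} \right| \le \frac{2}{\log \lambda_n} \tag{9}$$

eventually almost surely. We have to prove that $B_n \to 0$ almost surely. Note
that by the Markov inequality, given $\tau(X_0^k) = L$, for $L \le k$,

$$\frac{\sum_{l=\lceil \mu_L \log k \rceil}^\infty p_{l+L}}{\sum_{i=L}^\infty p_i} \le \frac{1}{\log k}, \tag{10}$$

where $\mu_L = \sum_{i=L}^\infty (i - L) p_i / \sum_{i=L}^\infty p_i$.

Now observe that almost surely for sufficiently large $n$,

$$\mu_{\tau(X_0^{\lambda_n})} \le (\lambda_n)^\gamma. \tag{11}$$

Indeed

$$h_n(X_0^{\lambda_n}) = \frac{1}{\lceil (\lambda_n)^{1-\gamma} \rceil} \sum_{i=\kappa_n}^{\lambda_n - 1} I_{\{\tau(X_0^i) = \tau(X_0^{\lambda_n})\}}\,\sigma_i \le \frac{\lambda_n - \lceil (\lambda_n)^{1-\gamma} \rceil}{\lceil (\lambda_n)^{1-\gamma} \rceil} \le (\lambda_n)^\gamma - 1$$

[in the data segment $X_0^{\lambda_n}$ there are at least
$\lceil (\lambda_n)^{1-\gamma} \rceil$ zeros] and we have already proved that
$h_n(X_0^{\lambda_n}) - \mu_{\tau(X_0^{\lambda_n})} \to 0$.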
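Now, apply (11), (10) and the upper bound on $A_n$ in (9) in order to get

$$\begin{aligned} B_n &\le 1 - \sum_{l=0}^{\lceil (\lambda_n)^\gamma \log \lambda_n \rceil - 1} \hat p_l(X_0^{\lambda_n}) + \frac{1}{\log \lambda_n} \\ &\le 1 - \sum_{l=0}^{\lceil (\lambda_n)^\gamma \log \lambda_n \rceil - 1} \frac{p_{l+\tau(X_0^{\lambda_n})}}{\sum_{i=\tau(X_0^{\lambda_n})}^\infty p_i} + \sum_{l=0}^{\lceil (\lambda_n)^\gamma \log \lambda_n \rceil - 1} \left| \hat p_l(X_0^{\lambda_n}) - \frac{p_{l+\tau(X_0^{\lambda_n})}}{\sum_{i=\tau(X_0^{\lambda_n})}^\infty p_i} \right| + \frac{1}{\log \lambda_n} \\ &\le \frac{4}{\log \lambda_n} \le \frac{4}{\log n} \end{aligned}$$

eventually almost surely, and so $B_n \to 0$ almost surely. The proof of
Theorem 1 is complete.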

4. Proof of Theorem 2. The proof is similar to that of Theorem 1 but
with a number of changes required to deal with the weaker hypothesis. It
is easy to see that $\lim_{n\to\infty} \frac{\lambda_n^*}{n} = 1$ since if a
block of 1's has positive probability it will appear with that frequency,
which is eventually greater than $2^{\lfloor \log t \rfloor (1-\gamma)} / t$
(which tends to zero). Formally,

$$\liminf_{n\to\infty} \frac{n}{\lambda_n^*} \ge \liminf_{N\to\infty} \frac{\max\{i \ge 0 : \lambda_i^* \le N\}}{N} \ge \liminf_{N\to\infty} \frac{\max\{i \ge 0 : \tau(X_0^{\lambda_i^*}) \le K,\ \lambda_i^* \le N\}}{N} = P(\tau(X_{-\infty}^0) \le K)$$

for arbitrary large $K$. Thus

$$1 \ge \limsup_{n\to\infty} \frac{n}{\lambda_n^*} \ge \liminf_{n\to\infty} \frac{n}{\lambda_n^*} \ge P(\tau(X_{-\infty}^0) < \infty) = 1.$$

Let $k < m$ be fixed. Define $j_0^{(k,m)} = m$ and for $i \ge 1$ let
$j_i^{(k,m)}$ denote the $(i+1)$st occurrence of $\tau(X_{-\infty}^k)$ (reading
forward, starting at position $m$), that is,

$$j_i^{(k,m)} = \min\{t > j_{i-1}^{(k,m)} : \tau(X_{-\infty}^t) = \tau(X_{-\infty}^k)\}.$$
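Now for $i \ge 1$ define

$$Z_i^{(k,m)} = \sigma_{j_i^{(k,m)}}.$$

Clearly the $Z_i^{(k,m)}$ are conditionally independent and identically
distributed given $\tau(X_{-\infty}^k) = L$. For $1 < \alpha \le 2$ apply the
Markov inequality and Theorem 2 of von Bahr and Esseen in [5] to get that

$$P\left( \left| \frac{\sum_{i=1}^{\lceil (2^m)^{1-\gamma} \rceil} Z_i^{(k,m)}}{\lceil (2^m)^{1-\gamma} \rceil} - \frac{\sum_{h=0}^\infty h\,p_{h+L}}{\sum_{h=L}^\infty p_h} \right| > \varepsilon \,\middle|\, \tau(X_{-\infty}^k) = L \right) \le \frac{10}{\varepsilon^\alpha (2^m)^{(1-\gamma)(\alpha-1)}} \cdot \frac{\sum_{h=0}^\infty h^\alpha p_{h+L}}{\sum_{h=L}^\infty p_h}.$$

[Notice that
$E(|Z_1^{(k,m)}|^\alpha \mid \tau(X_{-\infty}^k) = L) = \frac{\sum_{h=0}^\infty h^\alpha p_{h+L}}{\sum_{h=L}^\infty p_h}$.]
Multiply both sides of the last inequality by
$P(\tau(X_{-\infty}^k) = L) = \frac{1}{1 + \sum_{h=0}^\infty h p_h} \sum_{h=L}^\infty p_h$
(note that by Kac's theorem
$P(X_{k-L} = 0) = \frac{1}{1 + \sum_{h=0}^\infty h p_h}$; cf. [7], Chapter XIII
and [28], Section I.2.c) and sum over $L$. It is easy to see that

$$\sum_{L=0}^\infty \frac{\sum_{h=0}^\infty h^\alpha p_{h+L}}{\sum_{h=L}^\infty p_h} \cdot \frac{\sum_{h=L}^\infty p_h}{1 + \sum_{h=0}^\infty h p_h} \le \frac{\sum_{h=0}^\infty h^{\alpha+1} p_h}{1 + \sum_{h=0}^\infty h p_h}$$

and we get the following estimate:

$$P\left( \left| \frac{\sum_{i=1}^{\lceil (2^m)^{1-\gamma} \rceil} Z_i^{(k,m)}}{\lceil (2^m)^{1-\gamma} \rceil} - \frac{\sum_{h=0}^\infty h\,p_{h+\tau(X_{-\infty}^k)}}{\sum_{h=\tau(X_{-\infty}^k)}^\infty p_h} \right| > \varepsilon \right) \le \frac{10}{\varepsilon^\alpha (2^m)^{(1-\gamma)(\alpha-1)}} \cdot \frac{\sum_{h=0}^\infty h^{\alpha+1} p_h}{1 + \sum_{h=0}^\infty h p_h}$$

and in turn

$$P\left( \max_{0 \le k \le m-1} \left| \frac{\sum_{i=1}^{\lceil (2^m)^{1-\gamma} \rceil} Z_i^{(k,m)}}{\lceil (2^m)^{1-\gamma} \rceil} - \frac{\sum_{h=0}^\infty h\,p_{h+\tau(X_{-\infty}^k)}}{\sum_{h=\tau(X_{-\infty}^k)}^\infty p_h} \right| > \varepsilon \right) \le \frac{10m}{\varepsilon^\alpha (2^m)^{(1-\gamma)(\alpha-1)}} \cdot \frac{\sum_{h=0}^\infty h^{\alpha+1} p_h}{1 + \sum_{h=0}^\infty h p_h}$$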

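and the right-hand side is summable. For $\alpha > 2$ apply the Markov
inequality and Theorem 2.10 of Petrov [26] to get that

$$P\left( \left| \frac{\sum_{i=1}^{\lceil (2^m)^{1-\gamma} \rceil} Z_i^{(k,m)}}{\lceil (2^m)^{1-\gamma} \rceil} - \frac{\sum_{h=0}^\infty h\,p_{h+L}}{\sum_{h=L}^\infty p_h} \right| > \varepsilon \,\middle|\, \tau(X_{-\infty}^k) = L \right) \le \frac{2C(\alpha)}{\varepsilon^\alpha 2^{m(1-\gamma)\alpha/2}} \cdot \frac{\sum_{h=0}^\infty h^\alpha p_{h+L}}{\sum_{h=L}^\infty p_h},$$

where $C(\alpha)$ depends only on $\alpha$. Integrating both sides, just as in
the previous case above, we get

$$P\left( \left| \frac{\sum_{i=1}^{\lceil (2^m)^{1-\gamma} \rceil} Z_i^{(k,m)}}{\lceil (2^m)^{1-\gamma} \rceil} - \frac{\sum_{h=0}^\infty h\,p_{h+\tau(X_{-\infty}^k)}}{\sum_{h=\tau(X_{-\infty}^k)}^\infty p_h} \right| > \varepsilon \right) \le \frac{2C(\alpha)}{\varepsilon^\alpha 2^{m(1-\gamma)\alpha/2}} \cdot \frac{\sum_{h=0}^\infty h^{\alpha+1} p_h}{1 + \sum_{h=0}^\infty h p_h}$$

and in turn

$$P\left( \max_{0 \le k \le m-1} \left| \frac{\sum_{i=1}^{\lceil (2^m)^{1-\gamma} \rceil} Z_i^{(k,m)}}{\lceil (2^m)^{1-\gamma} \rceil} - \frac{\sum_{h=0}^\infty h\,p_{h+\tau(X_{-\infty}^k)}}{\sum_{h=\tau(X_{-\infty}^k)}^\infty p_h} \right| > \varepsilon \right) \le \frac{2mC(\alpha)}{\varepsilon^\alpha 2^{m(1-\gamma)\alpha/2}} \cdot \frac{\sum_{h=0}^\infty h^{\alpha+1} p_h}{1 + \sum_{h=0}^\infty h p_h}$$

and the right-hand side is summable. Applying the Borel-Cantelli lemma in
both cases one gets that

$$\max_{0 \le k \le m-1} \left| \frac{\sum_{i=1}^{\lceil (2^m)^{1-\gamma} \rceil} Z_i^{(k,m)}}{\lceil (2^m)^{1-\gamma} \rceil} - \frac{\sum_{h=0}^\infty h\,p_{h+\tau(X_{-\infty}^k)}}{\sum_{h=\tau(X_{-\infty}^k)}^\infty p_h} \right| < \varepsilon$$

eventually almost surely. Since $2^m \le \lambda_n^* < 2^{m+1}$ for some $m$, we
get that

$$|h_n^*(X_0, \dots, X_{\lambda_n^*}) - \theta_{\lambda_n^*}| < \varepsilon$$

eventually almost surely, which since $\varepsilon$ was arbitrary gives (5).
[Indeed, observe first that for $k \ge \psi$,
$\tau(X_{-\infty}^k) = \tau(X_0^k)$. Now for suitable
$k < \lfloor \log \lambda_n^* \rfloor$ and $m = \lfloor \log \lambda_n^* \rfloor$:
$h_n^*(X_0, \dots, X_{\lambda_n^*}) = \frac{\sum_{i=1}^{\lceil (2^m)^{1-\gamma} \rceil} Z_i^{(k,m)}}{\lceil (2^m)^{1-\gamma} \rceil}$
and
$\theta_{\lambda_n^*} = \frac{\sum_{h=0}^\infty h\,p_{h+\tau(X_{-\infty}^k)}}{\sum_{h=\tau(X_{-\infty}^k)}^\infty p_h}$.]

Now we will deal with (6). For $k < m$ define

$$Z_{i,l}^{(k,m)} = I_{\{Z_i^{(k,m)} = l\}}.$$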



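Clearly, for fixed $k < m$ and $l$, the $Z_{i,l}^{(k,m)}$, $i \ge 1$, are
conditionally independent and identically distributed given
$\tau(X_{-\infty}^k) = L$. Apply Hoeffding's inequality to get that

$$P\left( \left| \frac{\sum_{i=1}^{\lceil (2^m)^{1-\gamma} \rceil} Z_{i,l}^{(k,m)}}{\lceil (2^m)^{1-\gamma} \rceil} - \frac{p_{l+L}}{\sum_{h=L}^\infty p_h} \right| > (2^m)^{-\gamma} m^{-2} \,\middle|\, \tau(X_{-\infty}^k) = L \right) \le e^{-(2^m)^{1-\gamma} / (2 (2^m)^{2\gamma} m^4)}.$$

After integrating both sides with respect to the conditioning, and using the
union bound on the events for $0 \le l \le \lceil (2^m)^\gamma m \rceil$, we get

$$P\left( \max_{0 \le l \le \lceil (2^m)^\gamma m \rceil} \left| \frac{\sum_{i=1}^{\lceil (2^m)^{1-\gamma} \rceil} Z_{i,l}^{(k,m)}}{\lceil (2^m)^{1-\gamma} \rceil} - \frac{p_{l+\tau(X_{-\infty}^k)}}{\sum_{h=\tau(X_{-\infty}^k)}^\infty p_h} \right| > (2^m)^{-\gamma} m^{-2} \right) \le \lceil (2^m)^\gamma m \rceil\, e^{-(2^m)^{1-\gamma} / (2 (2^m)^{2\gamma} m^4)}.$$

Now

$$P\left( \sum_{l=0}^{\lceil (2^m)^\gamma m \rceil - 1} \left| \frac{\sum_{i=1}^{\lceil (2^m)^{1-\gamma} \rceil} Z_{i,l}^{(k,m)}}{\lceil (2^m)^{1-\gamma} \rceil} - \frac{p_{l+\tau(X_{-\infty}^k)}}{\sum_{h=\tau(X_{-\infty}^k)}^\infty p_h} \right| > \frac{\lceil (2^m)^\gamma m \rceil}{(2^m)^\gamma m^2} \right) \le \lceil (2^m)^\gamma m \rceil\, e^{-(2^m)^{1-\gamma} / (2 (2^m)^{2\gamma} m^4)}$$

and

$$P\left( \max_{0 \le k < m} \sum_{l=0}^{\lceil (2^m)^\gamma m \rceil - 1} \left| \frac{\sum_{i=1}^{\lceil (2^m)^{1-\gamma} \rceil} Z_{i,l}^{(k,m)}}{\lceil (2^m)^{1-\gamma} \rceil} - \frac{p_{l+\tau(X_{-\infty}^k)}}{\sum_{h=\tau(X_{-\infty}^k)}^\infty p_h} \right| > \frac{\lceil (2^m)^\gamma m \rceil}{(2^m)^\gamma m^2} \right) \le m \lceil (2^m)^\gamma m \rceil\, e^{-(2^m)^{1-\gamma} / (2 (2^m)^{2\gamma} m^4)},$$

which is summable and so by the Borel-Cantelli lemma,

$$\max_{0 \le k < m} \sum_{l=0}^{\lceil (2^m)^\gamma m \rceil - 1} \left| \frac{\sum_{i=1}^{\lceil (2^m)^{1-\gamma} \rceil} Z_{i,l}^{(k,m)}}{\lceil (2^m)^{1-\gamma} \rceil} - \frac{p_{l+\tau(X_{-\infty}^k)}}{\sum_{h=\tau(X_{-\infty}^k)}^\infty p_h} \right| \le \frac{\lceil (2^m)^\gamma m \rceil}{(2^m)^\gamma m^2} \le \frac{2}{m}$$

eventually almost surely. Since $2^m \le \lambda_n^* < 2^{m+1}$ for some $m$,

$$\sum_{l=0}^{\lceil 2^{\lfloor \log \lambda_n^* \rfloor \gamma} \lfloor \log \lambda_n^* \rfloor \rceil - 1} \left| p_l^*(X_0^{\lambda_n^*}) - \frac{p_{l+\tau(X_0^{\lambda_n^*})}}{\sum_{i=\tau(X_0^{\lambda_n^*})}^\infty p_i} \right| \le \frac{2}{\lfloor \log \lambda_n^* \rfloor} \tag{12}$$

eventually almost surely. [Indeed, observe first that for $k \ge \psi$,
$\tau(X_{-\infty}^k) = \tau(X_0^k)$. Now for suitable
$k < \lfloor \log \lambda_n^* \rfloor$ and $m = \lfloor \log \lambda_n^* \rfloor$:
$p_l^*(X_0^{\lambda_n^*}) = \frac{\sum_{i=1}^{\lceil (2^m)^{1-\gamma} \rceil} Z_{i,l}^{(k,m)}}{\lceil (2^m)^{1-\gamma} \rceil}$
and
$\frac{p_{l+\tau(X_0^{\lambda_n^*})}}{\sum_{i=\tau(X_0^{\lambda_n^*})}^\infty p_i} = \frac{p_{l+\tau(X_{-\infty}^k)}}{\sum_{h=\tau(X_{-\infty}^k)}^\infty p_h}$.]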
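Observe that

$$\sum_{l=0}^\infty \left| p_l^*(X_0^{\lambda_n^*}) - \frac{p_{l+\tau(X_0^{\lambda_n^*})}}{\sum_{i=\tau(X_0^{\lambda_n^*})}^\infty p_i} \right| = \sum_{l=0}^{\lceil 2^{\lfloor \log \lambda_n^* \rfloor \gamma} \lfloor \log \lambda_n^* \rfloor \rceil - 1} \left| p_l^*(X_0^{\lambda_n^*}) - \frac{p_{l+\tau(X_0^{\lambda_n^*})}}{\sum_{i=\tau(X_0^{\lambda_n^*})}^\infty p_i} \right| + \sum_{l=\lceil 2^{\lfloor \log \lambda_n^* \rfloor \gamma} \lfloor \log \lambda_n^* \rfloor \rceil}^\infty \left| p_l^*(X_0^{\lambda_n^*}) - \frac{p_{l+\tau(X_0^{\lambda_n^*})}}{\sum_{i=\tau(X_0^{\lambda_n^*})}^\infty p_i} \right| = A_n + B_n.$$

By (12), $A_n \to 0$ almost surely. We have to prove that $B_n \to 0$ almost
surely. Note that by the Markov inequality, given $\tau(X_0^k) = L$, for
$L \le k$,

$$\frac{\sum_{l=\lceil \mu_L \lfloor \log k \rfloor \rceil}^\infty p_{l+L}}{\sum_{i=L}^\infty p_i} \le \frac{1}{\lfloor \log k \rfloor}, \tag{13}$$

where $\mu_L = \sum_{i=L}^\infty (i - L) p_i / \sum_{i=L}^\infty p_i$.

Now observe that almost surely for sufficiently large $n$,

$$\mu_{\tau(X_0^{\lambda_n^*})} \le 2^{\lfloor \log \lambda_n^* \rfloor \gamma}. \tag{14}$$

Indeed

$$h_n^*(X_0^{\lambda_n^*}) \le \frac{2^{\lfloor \log \lambda_n^* \rfloor} - \lceil 2^{\lfloor \log \lambda_n^* \rfloor (1-\gamma)} \rceil}{\lceil 2^{\lfloor \log \lambda_n^* \rfloor (1-\gamma)} \rceil} \le 2^{\lfloor \log \lambda_n^* \rfloor \gamma} - 1$$

(in the data segment $X_0^{\lambda_n^*}$ there are at least
$\lceil 2^{\lfloor \log \lambda_n^* \rfloor (1-\gamma)} \rceil$ zeros) and we have
already proved that
$h_n^*(X_0^{\lambda_n^*}) - \mu_{\tau(X_0^{\lambda_n^*})} \to 0$.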

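Now, apply (14), (13) and the upper bound on $A_n$ in (12) in order to get

$$\begin{aligned} B_n &\le \sum_{l=\lceil 2^{\lfloor \log \lambda_n^* \rfloor \gamma} \lfloor \log \lambda_n^* \rfloor \rceil}^\infty p_l^*(X_0^{\lambda_n^*}) + \sum_{l=\lceil 2^{\lfloor \log \lambda_n^* \rfloor \gamma} \lfloor \log \lambda_n^* \rfloor \rceil}^\infty \frac{p_{l+\tau(X_0^{\lambda_n^*})}}{\sum_{i=\tau(X_0^{\lambda_n^*})}^\infty p_i} \\ &\le 1 - \sum_{l=0}^{\lceil 2^{\lfloor \log \lambda_n^* \rfloor \gamma} \lfloor \log \lambda_n^* \rfloor \rceil - 1} p_l^*(X_0^{\lambda_n^*}) + \frac{1}{\lfloor \log \lambda_n^* \rfloor} \\ &\le 1 - \sum_{l=0}^{\lceil 2^{\lfloor \log \lambda_n^* \rfloor \gamma} \lfloor \log \lambda_n^* \rfloor \rceil - 1} \frac{p_{l+\tau(X_0^{\lambda_n^*})}}{\sum_{i=\tau(X_0^{\lambda_n^*})}^\infty p_i} + \sum_{l=0}^{\lceil 2^{\lfloor \log \lambda_n^* \rfloor \gamma} \lfloor \log \lambda_n^* \rfloor \rceil - 1} \left| p_l^*(X_0^{\lambda_n^*}) - \frac{p_{l+\tau(X_0^{\lambda_n^*})}}{\sum_{i=\tau(X_0^{\lambda_n^*})}^\infty p_i} \right| + \frac{1}{\lfloor \log \lambda_n^* \rfloor} \\ &\le \frac{4}{\lfloor \log \lambda_n^* \rfloor} \le \frac{4}{\lfloor \log n \rfloor} \end{aligned}$$

eventually almost surely, and so $B_n \to 0$ almost surely. The proof of
Theorem 2 is complete.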

5. Proof of Theorem 3. In the proof of this theorem we do not need to
use explicit estimates and can rely on the ergodic theorem alone. Notice that
these renewal processes are always ergodic and therefore any finite block that
occurs at all with positive probability will almost surely eventually occur in
the data segment $X_0^n$ with an empirical distribution which is converging to
its probability. This observation yields the following for any fixed $m$:

n— 1 

n £ V(^)<Ai^(^)-Er^^ fc+ r ( ^)/Er=r(xs)P*i< 2 - m } = 1 

almost surely. What follows is that for each $m$,

$$\lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} I_{\left\{ \left| \hat h_i(X_0^i) - \sum_{k=0}^\infty k\,p_{k+\tau(X_0^i)} / \sum_{k=\tau(X_0^i)}^\infty p_k \right| < 2^{-m} \right\}} = 1$$

almost surely.

To obtain the set $D_1$ with density 1 we will construct an auxiliary sequence
of integers $N_m$ tending to infinity as follows. For a fixed realization
$X_0^\infty$, let $N_0 = 0$ and for $m \ge 1$ define

$$N_m = \min\left\{ n > N_{m-1} : \forall i \ge n,\ \frac{1}{i} \sum_{j=0}^{i-1} I_{\left\{ \left| \hat h_j(X_0^j) - \sum_{k=0}^\infty k\,p_{k+\tau(X_0^j)} / \sum_{k=\tau(X_0^j)}^\infty p_k \right| < 2^{-(m+1)} \right\}} > 1 - 2^{-(m+1)} \right\}.$$

The existence of these $N_m$'s follows once again from the ergodic theorem and
since we are requiring only a countable number of conditions we may assume
that these are satisfied simultaneously on a single set with probability 1.
Notice that for any $i \ge N_m$ the number of indices $j < i$ where the error
we are making is at most $2^{-(m+1)}$ is at least $i(1 - 2^{-(m+1)})$. Using
this sequence define the set of indices $D_1(X_0^\infty)$ as

$$D_1(X_0^\infty) = \bigcup_{m=1}^\infty \left\{ N_m \le n < N_{m+1} : \left| \hat h_n(X_0, \dots, X_n) - \frac{\sum_{k=0}^\infty k\,p_{k+\tau(X_0^n)}}{\sum_{k=\tau(X_0^n)}^\infty p_k} \right| < 2^{-m} \right\}.$$

By our previous observation the density of this $D_1$ will be 1, namely:

$$\lim_{n\to\infty} \frac{|D_1(X_0^\infty) \cap \{0, 1, \dots, n\}|}{n+1} = 1.$$

Furthermore,

$$\lim_{n \in D_1(X_0^\infty),\ n\to\infty} \left| \hat h_n(X_0, \dots, X_n) - \frac{\sum_{k=0}^\infty k\,p_{k+\tau(X_0^n)}}{\sum_{k=\tau(X_0^n)}^\infty p_k} \right| = 0.$$

For $\hat p_l(X_0^n)$ the proof proceeds along similar lines. A set
$D_2(X_0^\infty)$ is constructed with density 1 along which (8) will hold and
the set $D$ in the theorem is taken to be $D_1 \cap D_2$ which has density 1.
The proof of Theorem 3 is complete.



6. Proof of Theorem 4. Suppose that on the contrary

$$P\Bigl(\lim_{n\to\infty} |h_n(X_0, \dots, X_{\lambda_n}) - \theta_{\lambda_n}| = 0\Bigr) = 1$$

for all binary classical renewal processes.

We first define an auxiliary Markov chain $\mathcal{M}^{(0)}$. Let the state space be the
nonnegative integers. For i > let p^J = and pf^i j = 1. Clearly, state 
zero is positive recurrent and since the Markov chain is irreducible this 
Markov chain yields a stationary and ergodic distribution. We will modify
this Markov chain $\mathcal{M}^{(0)}$ in such a way that the limiting Markov
chain $\mathcal{M}^{(\infty)}$ will remain stationary and ergodic.

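The binary classical renewal process is defined as $X_n^{(i)} = 0$ if
$M_n^{(i)} = 0$ and $X_n^{(i)} = 1$ otherwise. Let $L_0 = 0$.
Now choose $N_1$ large enough that

$$P\left( \bigcup_{n=1}^\infty \left\{ L_0 < \lambda_n \le N_1,\ X_{\lambda_n}^{(0)} = 0,\ \left| h_n(X_0^{(0)}, \dots, X_{\lambda_n}^{(0)}) - \sum_{i=0}^\infty i\,p_{0,i}^{(0)} \right| < \frac{1}{100} \right\} \,\middle|\, X_0^{(0)} = 0 \right) > 1 - \frac{1}{1000}.$$

This can be done since $P(X_n^{(0)} = 0) > 0$ and
$\lim_{n\to\infty} \frac{\lambda_n}{n} = 1$. Note that if
$X_{\lambda_n}^{(0)} = 0$, then
$\theta_{\lambda_n}^{(0)} = \sum_{i=0}^\infty i\,p_{0,i}^{(0)}$.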



For an arbitrary 5\ < 0.25pq q (which will be specified later) let Pq q 



,(°) 



Pb,o ~ ^i ana ^ f° r some k± > Jj-, p^fei = Po'k 1 + ^i- Now the change in 



fo) 



E^g-E^ = Mi>2. 



(o) 



i=0 
Now choose $\delta_1$ so small that

.(o) 



£ |P(4 U) = *o, ... , X™ = x Nl \X^ = 0) 

{x ,...,x Nl )e{o,i} N i 

- p(x^ = x ,..., x$ = x Nl |x« = o; 

-a) 



< 



i 

1000 ' 



In this way, for the $\{X_n^{(1)}\}$ process, for some
$L_0 < \lambda_n \le N_1$, the estimate
$h_n(X_0^{(1)}, \dots, X_{\lambda_n}^{(1)})$ will be smaller than the target
$\sum_{i=0}^\infty i\,p_{0,i}^{(1)}$ by at least 1 with probability
$1 - \frac{2}{1000}$.



For an arbitrary 2Vi < L\ let Ylii>L x Pol = Pi • For an arbitrary 82 < 
0.25^5 let p'qq = p'qI — 82 and for some k 2 which will be specified later, 

PoM = p< om + ^ 2 • Now the chan g e in 

.(2) 



Z^i=Li Po,i 



and in 



00 00 

.(2) Y^J 1 ) 



i=0 



i=0 
i-2 



Choose JVi < L\ such that 3/?i < 100 . Now choose N 2 so big such that 

/ 00 



An 



1, 



^(X^ 1 ,...,X^) 



> 1 



1 V 



(i) 

0,« 



Z^i=L 1 Po 



(1) 



< 



100 



X«=0 



1000 y 

(i) 



-(I) 



Note that if AT ; , = and Xi, 

2^8=1! Po,i 

Choose /c2 so large and 82 so small such that 



X 



(i) 



1, then 9 



(l) 



fc 2 ^2 ^2 E 4 =Li «Po 



(1) 



/3i + 5 2 PiiPi+h) 



>2, 



100 2 
and



E \P(X [ o j =x ,...,X^= x N2 \X^ = 0) 

N 2 ) 

- P(Af } =x Q ,...,X§l= x N2 \X® = 0)| < — ^ . 



1000 J 

In this way, for the .M^ 2 ) process for some Lq < A ni < iVx and for some 

(2) v (2), 



other L\ < X n2 < N2 the estimate h n (XQ , . . . ,X^ ) will be smaller than the 
target by at least 1 with probability 1 — — ■ (Note that J2iZo ^Po\ — 
J2i^o Wo,i ■) 

Inductively, assume at stage $j$ we have a Markov chain $\mathcal{M}^{(j)}$
which satisfies the conditions ($C_j$):

There are integers $L_0 < N_1 < L_1 < \cdots < N_j$ such that

/ 00 00 j ( 

p[ u u n ^<^<^r^ 

\ni=l n,j=li=l K 

X^ = ■ ■ ■ = X^ = 1 

^rij— Li— i+l A n ' 



h ryO") y (i) \ ^ ^=Li-A h ~ L i-i)Pol 
h nt {X Q ,...,X Xn )< — ^ — 



V°° r> KJ > 
2^h=L i ^ 1 Po,h 



x<p=o 



(15) >i-E 



3 2 



, 1000* 

1=1 



(x ,...,x Jv jG{0,l} JV J 



(16) - P(4 J) = *o, • ■ • , X%] = x Nj \X® = 0) I < ^ 
and 

(17) E^K<i+EibV 

fi,=0 /i=l 

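Now we will define $\mathcal{M}^{(j+1)}$. For an arbitrary $N_j < L_j$ let
$\sum_{i \ge L_j} p_{0,i}^{(j)} = \beta$. For some $\delta < 0.25\,p_{0,0}^{(j)}$
and $k \ge L_j$ which will be specified later, let
$p_{0,0}^{(j+1)} = p_{0,0}^{(j)} - \delta$ and
$p_{0,k}^{(j+1)} = p_{0,k}^{(j)} + \delta$. Now the change in the conditional
mean is

$$\frac{\sum_{i=L_j}^\infty (i - L_j)\,p_{0,i}^{(j+1)}}{\sum_{i=L_j}^\infty p_{0,i}^{(j+1)}} - \frac{\sum_{i=L_j}^\infty (i - L_j)\,p_{0,i}^{(j)}}{\sum_{i=L_j}^\infty p_{0,i}^{(j)}} = \frac{k\delta}{\beta + \delta} - \frac{\delta \sum_{i=L_j}^\infty i\,p_{0,i}^{(j)}}{\beta(\beta + \delta)}$$

and in the unconditional mean it is

$$\sum_{i=0}^\infty i\,p_{0,i}^{(j+1)} - \sum_{i=0}^\infty i\,p_{0,i}^{(j)} = k\delta.$$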

Now choose Lj such that 3/3 < 100 . Choose iV,-+i so big such that 



/ CO ( 



P(l)\Lj<K<N j+1 ,X^_ Lj =0,X 



U) 



\n=l K 



'^A n -L,+1 



CD 

An 



0') 



ft n (A ,...,A A )- 



2^i=L 3 Po,i 



< 



100 



x^=o 



> 1 



1000 



Note that if k > K = maxo<i<j ^ ' — then for all < i < j, 

Z-*ih=L i P 0,h 



(18) 



V°° r, (i+1) 
2^h=LiPo,h 



V°° rfi) 
2^h=Li Po,h 



Choose k > K so large and 5 so small such that 

kd 



(3 + 5 P(P + 6) 



>2, 



kS< 



1 



100J+ 1 



and 



E 



TO 



u) 



x 



U) 



3 + 1 



^Afj+i 1^0 



(?) 



(xo,...^ )e{o,i}^' 



(i+i) 



X 0,- ■■ > A JVj+i -^3+1 I 



0) 

(j+1) 



0)1 



< 



1 



1000J+ 1 

The resulting Markov chain $\mathcal{M}^{(j+1)}$ is irreducible and positive
recurrent and so it yields a stationary and ergodic distribution and the
inductive assumption holds for $j + 1$.

Define $p_{0,i}^{(\infty)} = \lim_{j\to\infty} p_{0,i}^{(j)}$. The resulting
Markov chain $\mathcal{M}^{(\infty)}$ is clearly irreducible and positive
recurrent and so it yields a stationary and ergodic distribution. Let
$X_n = 0$ if $M_n^{(\infty)} = 0$ and 1 otherwise. Clearly, by the induction
and (18),

r (oo) 



(oo oo j , 

|J,...,U r\\U-l<\n t <N U X Xn _ Li =Q, 
n±=l n ? =l i=l 



X 



l. 



h ni {X ,...,X Xn .)< — R 1 

2^h=Li Po,h 



An = 



>i-E 



i=i 



1000* 



Since the set (event) is decreasing in j so 

(oo oo oo , 
U U f]\L i -i<K l <N i ,X Xni . Li =0, 
l=ln,=li=l 



A' 



= 1, 



h ni (X , ...,X Xni )< - 1 f 



An = 



and 



>i-E 



i=l 



1000 1 



The proof of Theorem 4 is complete. 



100 /l ' 



7. Proof of Theorem 5. Consider the largest $L$ such that

$$P(\tau(X_{-\infty}^0) < L) < 1 - \frac{\varepsilon}{2}.$$

Applying the ergodic theorem we get that almost surely,

$$\liminf_{n\to\infty} \frac{n}{\lambda_n^{(\varepsilon)}} \ge P(\tau(X_{-\infty}^0) \le L) \ge 1 - \frac{\varepsilon}{2} > 1 - \varepsilon.$$

It is also clear that

$$\limsup_{n\to\infty} \tau(X_0^{\lambda_n^{(\varepsilon)}}) \le L,$$

and so eventually we are predicting for finitely many blocks of 1's and by
ergodicity the consistency of the estimator
$h_n^{(\varepsilon)}(X_0, \dots, X_{\lambda_n^{(\varepsilon)}})$ is also
established. Since $p_l^{(\varepsilon)}(X_0^{\lambda_n^{(\varepsilon)}})$ is a
probability distribution, now by ergodicity its consistency in total variation
follows immediately for the same reason and the proof of Theorem 5 is complete.

Acknowledgment. We thank the referees of an earlier version for several 
helpful remarks including the reference to Petrov's book. 